Scaling dbt and BigQuery to infinity and beyond from Coalesce 2023

Team members at Bluecore, Adam Whitaker, analytics lead, and Nicole Dallar-Malburg, analytics engineer, discuss how their team scaled their data warehouse.

"So, overcoming technical challenges was only half of this. We needed to make sure that we didn't bankrupt the company as we went through this whole process."

- Adam Whitaker, analytics lead at Bluecore

In the talk, they also share the lessons they learned along the way and their next steps in data warehouse maturity.

The use of dbt and macros for scalable data analytics

The Bluecore team used dbt to manage the analytics for their large retail marketing platform. To accommodate a vast amount of data, they built macros that could be reused across many model files.

"Each model has to contain the proper materialization tagging for orchestration purposes and labeling for cost tracking," says Adam. They leveraged dbt to create a consistent and dependable data set, using macros to reduce redundancy and maintain high-quality transformations.

"We've got two different flavors. We have entire model macros and specific field macros where the model macros contain all the transformation logic in a single place," he explains. Using these macros allowed them to create a system that could scale up to handle their extensive data needs.
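As a rough illustration of the "entire model macro" flavor, the sketch below generates thin model files that each carry a config block (materialization, tag, label) and then call one shared macro. It is a minimal sketch under assumptions, not Bluecore's code: the macro name `standard_model`, the `stg_` file prefix, the `hourly` tag, and the `cost_center` label are all hypothetical. The summary at the end of this recap mentions Python scripts for adding and removing models at scale, and `sync_models` shows one way such a script could work.

```python
from pathlib import Path
from string import Template

# Hypothetical template: each generated model is a thin wrapper around one
# shared model macro, so the transformation logic lives in a single place.
MODEL_TEMPLATE = Template(
    "{{ config(\n"
    "    materialized='$materialized',\n"
    "    tags=['$tag'],\n"
    "    labels={'cost_center': '$cost_center'}\n"
    ") }}\n"
    "\n"
    "{{ $macro_name(source_table='$source_table') }}\n"
)


def render_model(source_table: str, macro_name: str = "standard_model",
                 materialized: str = "incremental", tag: str = "hourly",
                 cost_center: str = "analytics") -> str:
    """Return the text of one model file (all default names are assumptions)."""
    return MODEL_TEMPLATE.substitute(
        source_table=source_table, macro_name=macro_name,
        materialized=materialized, tag=tag, cost_center=cost_center,
    )


def sync_models(source_tables, model_dir: Path) -> tuple[list[str], list[str]]:
    """Create files for new tables and delete files for tables no longer listed."""
    model_dir.mkdir(parents=True, exist_ok=True)
    wanted = {f"stg_{table}.sql": table for table in source_tables}
    existing = {path.name for path in model_dir.glob("stg_*.sql")}
    added = sorted(set(wanted) - existing)
    removed = sorted(existing - set(wanted))
    for name in added:
        (model_dir / name).write_text(render_model(wanted[name]))
    for name in removed:
        (model_dir / name).unlink()
    return added, removed
```

Because every file only calls the macro, a change to the transformation logic is made once in the macro rather than in each generated model.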

Overcoming technical challenges and optimizing cost management

While scaling up their system, Bluecore encountered several technical challenges. They had to deal with BigQuery's concurrent query limits, dbt Cloud's memory limits, and long project parse times. To manage these issues, they created multiple execution projects, reduced the model count in each dbt Cloud job, and explored alternate orchestration methods.
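One way to picture the combination of multiple execution projects and smaller jobs is a simple planner that spreads models across projects and then caps the number of models per job. This is a hedged sketch only: the round-robin assignment, the project names, and the `max_models_per_job` parameter are assumptions for illustration, and the talk recap does not describe how Bluecore actually partitioned its models.

```python
from itertools import islice


def chunked(items, size):
    """Yield consecutive batches of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def plan_jobs(models, projects, max_models_per_job):
    """Assign models round-robin to execution projects, then cap job size.

    Spreading models over several projects keeps each project further from
    its concurrent-query limit; smaller jobs keep each run lighter.
    """
    by_project = {project: [] for project in projects}
    for index, model in enumerate(sorted(models)):
        by_project[projects[index % len(projects)]].append(model)
    return [
        {"project": project, "models": batch}
        for project, assigned in by_project.items()
        for batch in chunked(assigned, max_models_per_job)
    ]


# Example with hypothetical names: 10 models, 2 execution projects, 3 per job.
jobs = plan_jobs([f"model_{i:02d}" for i in range(10)],
                 ["exec-project-a", "exec-project-b"], 3)
```

In this example each project receives five models, which are split into one job of three models and one job of two.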

"Going into this, we knew that ‘one table equals one model’ would be a problem for us," says Adam. Cost management mattered just as much: the team reduced slot usage by 90% and run times by 50%. Rebuilding all of their data on every run was too expensive, so they had to decide how much history to restate on each run and what their actual freshness needs were.
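The question of how much history to restate can be made concrete with a little arithmetic. The sketch below assumes a table with equally sized daily partitions, which is a simplification, and the numbers in the example are illustrative rather than Bluecore's. It is not how the 90% and 50% figures above were derived.

```python
from datetime import date, timedelta


def restate_window(run_date: date, lookback_days: int) -> tuple[date, date]:
    """Inclusive range of daily partitions to rebuild on this run."""
    return run_date - timedelta(days=lookback_days - 1), run_date


def restated_fraction(history_days: int, lookback_days: int) -> float:
    """Share of daily partitions touched per run versus a full rebuild.

    Assumes every daily partition is the same size.
    """
    return min(lookback_days, history_days) / history_days


# Illustrative numbers: two years of history, restating the last 3 days.
start, end = restate_window(date(2023, 10, 18), lookback_days=3)
fraction = restated_fraction(history_days=730, lookback_days=3)
```

Under these assumptions a three-day window touches 3 of 730 partitions per run, which is why the choice of window size is mostly a question of freshness and late-data needs rather than raw cost.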

Valuable lessons learned and future steps

Scaling their data warehouse taught the Bluecore team several valuable lessons. They learned the importance of aligning with business users on metric definitions and establishing data contracts with upstream data producers. They also realized that infrastructure engineering and scaling require different skills than pure analytics engineering work.

"We also learned some lessons within our team in regard to technical tradeoffs," says Nicole. They opted to use incremental models in exchange for capturing late-arriving data, even though it added complexity to their data pipeline.
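The tension between incremental processing and late-arriving data can be shown with a small in-memory model of a merge. This is a conceptual sketch, not dbt or BigQuery code: the `event_id` key, the `event_date` field, and the lookback parameter are hypothetical, and a real incremental model would express the same idea in SQL.

```python
from datetime import date, timedelta


def incremental_merge(target, source_rows, run_date, lookback_days,
                      key="event_id"):
    """Merge recent source rows into the target, keyed on `key`.

    Only rows dated within the lookback window are considered. A matched key
    is overwritten (update) and an unmatched key is added (insert).
    """
    cutoff = run_date - timedelta(days=lookback_days)
    merged = dict(target)
    for row in source_rows:
        if row["event_date"] >= cutoff:
            merged[row[key]] = row
    return merged


target = {1: {"event_id": 1, "event_date": date(2023, 1, 1), "amount": 10}}
source = [
    {"event_id": 2, "event_date": date(2023, 1, 9), "amount": 5},  # recent row
    {"event_id": 3, "event_date": date(2023, 1, 2), "amount": 7},  # arrives late
]

narrow = incremental_merge(target, source, date(2023, 1, 10), lookback_days=3)
wide = incremental_merge(target, source, date(2023, 1, 10), lookback_days=10)
```

With the three-day window the late row for `event_id` 3 is skipped, while the ten-day window picks it up at the price of reprocessing more history on every run.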

Looking forward, Bluecore’s team plans to continue to grow their footprint to capture more use cases and data sources. They also aim to reduce parse time, enable contributors from outside the data team under the federated analyst model, and add observability to their data warehouse.

Insights surfaced

  • They used dbt as the global transform layer to create a single source of truth for all of their data products
  • They faced challenges in scaling their data warehouse, including BigQuery's limit of 100 concurrent queries per project and long queue times inside their jobs
  • They overcame these challenges by creating multiple execution projects, reducing the model count in each dbt Cloud job, and leveraging Python scripts for adding and removing models at scale
  • They used incremental models with a merge strategy in their data warehouse to manage costs and meet their service level agreement
  • They are exploring alternate orchestration methods to enable partial parsing and reduce project parse time
